# Image to Text
Vit Gpt2 Image Captioning
Apache-2.0
This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text
aryan083
31
0
Bpe Vocab N OCR
Apache-2.0
Bpe-vocab-n-OCR is an advanced text extraction tool based on OCR, optimized for generating structured and tokenized output.
Image-to-Text
Transformers Supports Multiple Languages

prithivMLmods
76
4
BLIP Radiology Model
An image-to-text model built on the Transformers library that converts image content into descriptive text.
Image-to-Text
Transformers

motheecreator
152
0
OCR TextInput Base
An image-to-text model specialized for the financial domain that recognizes English text, primarily in images of financial documents.
Text Recognition
Transformers English

rohit5895
31
0
Trocr Base Finetune Numbers
TrOCR is a Transformer-based optical character recognition model designed to extract text content from images.
Image-to-Text
Transformers English

ANANDHU-SCT
23
0
Trocr Sinhala
This model is a fine-tuned version of Microsoft's TrOCR printed text model, specifically designed for Sinhala OCR tasks.
Text Recognition
Transformers Other

Ransaka
66
1
Ocrmnist
Apache-2.0
An optical character recognition model based on Hugging Face Transformers, specifically designed for recognizing MNIST-style digit images.
Text Recognition
Transformers English

vanshp123
16
0
Trocr Base Printed Captcha Ocr
A captcha recognition model fine-tuned from Microsoft's trocr-base-printed model, specifically designed for OCR tasks involving printed text.
Text Recognition
Transformers

chanelcolgate
33
1
Image Caption Using ViT GPT2
Apache-2.0
This is an image captioning model based on Vision Transformer (ViT) and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text
Transformers

Ayansk11
15
1
Trocr Base Fa V2
This is a Transformer-based OCR model specifically designed for recognizing Persian text in images.
Text Recognition Other
hezarai
64
3
Manga Ocr Base
Apache-2.0
An optical character recognition model specialized for Japanese text in manga.
Text Recognition
Transformers Japanese

TareHimself
96
1
Donut Base Sroie
MIT
A fine-tuned version of naver-clova-ix/donut-base, trained on an image folder dataset; no specific use case is stated.
Text Recognition
Transformers

iamkhadke
13
0
Hdd Words Ocr
An OCR model that recognizes Hebrew text in images.
Text Recognition
Transformers Other

sivan22
25
0
Pix2struct Docvqa Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs, supporting various tasks including image captioning and visual question answering.
Image-to-Text
Transformers Supports Multiple Languages

google
8,601
37
Donut Base Sroie
MIT
This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition
Transformers

unstructuredio
31
1
Ko Trocr Base Nsmc News Chatbot
MIT
This is a proof-of-concept model for Korean text recognition, built on the TrOCR architecture, that extracts Korean text from images.
Image-to-Text
Transformers Korean

daekeun-ml
44
10
Donut Base Sroie
MIT
A document understanding model fine-tuned from philschmid/donut-base-sroie.
Text Recognition
Transformers

Prem11100
13
0
Donut Base Medical Handwritten Prescriptions Information Extraction
MIT
A fine-tuned Donut model for extracting text information from handwritten medical prescription images.
Image-to-Text
Transformers

mjawadazad2321
71
1
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for image text extraction tasks.
Text Recognition
Transformers

philschmid
185
3
Trocr Base Printed
A fork of microsoft/trocr-base-printed, specializing in OCR tasks for printed text.
Text Recognition
philschmid
14
2
Doctr Torch Crnn Mobilenet V3 Large French
An optical character recognition (OCR) model based on TensorFlow 2 and PyTorch, supporting multilingual text detection and recognition.
Text Recognition
Transformers Supports Multiple Languages

Felix92
33
3
Vit Gpt2 Image Captioning
Apache-2.0
This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text
Transformers

nlpconnect
939.88k
887
Trocr Base Stage1
TrOCR is a Transformer-based pretrained optical character recognition model developed by Microsoft, suitable for single-line text image OCR tasks.
Image-to-Text
Transformers

microsoft
18.74k
13
Vit2distilgpt2
MIT
This is an image-to-text generation model that takes an image as input and outputs descriptive text.
Image-to-Text
Transformers English

sachin
49
8
Trocr Small Stage1
TrOCR is a Transformer-based pre-trained optical character recognition model that adopts an encoder-decoder architecture, suitable for OCR tasks on single-line text images.
Image-to-Text
Transformers

microsoft
3,713
12